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A BAYESIAN STATISTICAL APPROACH FOR 
INFERENCE ON STATIC ORIGIN-DESTINATION 

MATRICES 

By Luis Carvalho* 

Boston University 

We address the problem of static OD matrix estimation from a 
formal statistical viewpoint. We adopt a novel Bayesian framework 
to develop a class of models that explicitly cast trip configurations 
in the study region as random variables. As a consequence, classical 
solutions from growth factor, gravity, and maximum entropy mod- 
els are identified to specific estimators under the proposed models. 
We show that each of these solutions usually account for only a small 
fraction of the posterior probability mass in the ensemble and we then 
contend that the uncertainty in the inference should be propagated 
to later analyses or next-stage models. We also propose alternative, 
more robust estimators and devise Markov chain Monte Carlo sam- 
pling schemes to obtain them and perform other types of inference. 
We present several examples showcasing the proposed models and ap- 
proach, and highlight how other sources of data can be incorporated 
in the model and inference in a principled, non-heuristic way. 



1. Introduction. Consider a study region divided into n zones where 
trips can occur between any pair of zones. During a certain time period we 
observe the number of trips originated at zone i, Oi, and the number of trips 
destined to zone j, Dj, for i,j = 1, . . . ,n. Our objective is to estimate the 
number of trips Tjj from each zone i to each zone j — including intrazonal 
trips Tii — conditional on the O = {Oi}^^^ and V = {Dj}^^^. Since the trips 
T = {Tij}ij=i^,„^n can be represented by the matrix 

Til Ti2 ■ ■ ■ Tin 
T21 T22 ■ ■ ■ T2n 

Tnl Tn2 ' ' ' Tnn 



(LI) M 



and we are fixing a time window for the trip realizations, our problem is 
usually referred to as static OD matrix estimation. We note that the OD 



'Supported by NSF grant DMS- 1107067. 

Keywords and phrases: static OD matrix estimation, random matrix, constrained sam- 
pling 



1 



2 



L. CARVALHO 



matrix M has restrictions on its row and column margins 



n 



i = 1, . . . ,n 



(1.2) 



n 



Dj, j = l,... 



n. 



i=l 



and thus the estimation is constrained. We also require that XlILi ^« ~ 
'^^=1 Dj = T for consistency. It should be immediate from this formulation 
that static OD matrix estimation is a contingency table problem[2]; our goal 
here is to provide a broader treatment from a more applied perspective. 

This problem has been studied for many decades in the transportation 
literature. The first contributions to its solution adopted a physical inter- 
pretation and assumed T could be described by a gravitational law [1]: 
Tij (X OiDjd^^ , where dij is the distance between zones i and j. This func- 
tional relation was later generalized to include decreasing functions of trav- 
eling costs Cij between zones i and j, called "deterrence" functions: 



Common choices for d include exponential linear functions of costs, such as 
d{cij) = exp{—/3cij) or d{cij) = exp(— /3cij — alogCjj). These gravity models 
are synthetic models since they do not incorporate previously observed trip 
patterns. In contrast, growth factor models regard T as possible future trip 
patterns and incorporate previous observations in a doubly constrained for- 
mulation. Let the "seed" matrix To = {tij}i,j=i,...,n be previous observations 
from the same or similar study region. Based on the method proposed by 
Furness [4], we assume 



where Ai and Bj are "balancing factors" that are known up to a propor- 
tionality constant. Furness method defines T by iteratively solving for the 
balancing factors to respect constraints (1.2) until convergence. 

Both gravity and growth factor models provide estimates for T based on 
heuristic, functional arguments. Wilson [13, 14] defined a formulation based 
on entropy maximization that would unify both previous approaches. If 



(1.3) 



Tij (X OiDjd{cij). 



(1.4) 




T! 
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is the number of "micro" states associated with "meso" state T, then the 
trip configuration that maximizes W, or equivalently 



subject to constraints (1.2) is a maximum entropy solution. If instead of 
log W we maximize 



the solution would coincide with the one provided by the Furness model. By 
adding an additional cost constraint, such as 



we obtain the same estimates from the gravity model with d{cij) = exp(— /3cjj). 

We can make two important observations from the maximum entropy ap- 
proach. First, we note that the functional expressions for Tjj from the gravity 
and Furness models can actually be regarded as closed form expressions that 
can be used to iteratively obtain solutions to a mathematical program that 
maximizes log W or log W' subject to certain constraints. Second, since there 
are many feasible configurations for T, we can define weights — in Wilson's 
case given by W — to help us find the best trip configuration; it is, however, 
implicit from this formulation that any other trip pattern but the "optimal" 
is also possible, or even likely, to occur. 

In this paper we propose a formulation for the OD matrix estimation 
problem where T is explicitly random. As we will show, this formulation cor- 
responds to a Bayesian statistical approach, e.g. [5]. Even though our focus 
will be on exploring the randomness associated with the trip patterns instead 
of simply extracting a single trip pattern through optimization, we show that 
the maximum entropy solutions, including the classical gravity and growth 
model solutions, are identified with maximum a posteriori (MAP) estimates 
under our setup. Besides this unifying consequence, Bayesian methods also 
provide other types of estimators and, more generally, are able to quantify 
the uncertainty in estimation and to propagate it to posterior analyses in a 
principled, integrated framework. 



log W{T) - log T! « - ^ log - Tij 






(1.5) 
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2. Proposed Model. Let us say that the trips T are (O, V)- consistent, 
denoted by T € C{0,'D), if T satisfies equations (1.2). That is, we define 

!n n ^ 

f = {fij} : Tij = Oi and T-,- = DA. 
i=i i=i ) 

As stated before, we regard T as random while margin trips O and T> are 
observed data. As usual in the fully Bayesian approach we pursue next, all 
inferences are driven by the posterior distribution on T conditional on data 
O and V, 

Let us then consider the simple likelihood 
(2.1) V{0,V\T) = I[T eC{0,V)] 

where /(•) is the indicator function: I{A) = 1 if and only if A is true. By 
the definition of OD consistency, the likelihood in equation (2.1) just states 
that the margin trips satisfy equations (1.2), that is, it is a simple indicator 
for (O, P)-consistency. 

The randomness in trips T comes initially from our belief, before observ- 
ing any data in the margins, of how the trips are distributed. This belief is 
hardly subjective, but often arises from experience on similar regions and 
zones; in the next section we discuss how to incorporate knowledge gathered 
from small scale studies in the same region. To establish a parallel to the 
maximum entropy approach of the previous section, we assume that T has 
a conditional multinomial prior distribution given by T] T ~ MN(T, p), that 
is, 

lli.i 4 „■ 



where T is the total number of trips in the region and p = {pij}ij=i,...,n 
with pij being the proportion of trips between zones i and j. Of course, we 
require that ^ pij = 1 and pij are nonnegative. The hyper-prior parameter 
T has an improper non-informative distribution P(T) oc 1, and so the prior 
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becomes 

oo 

p(r) = J^p(r|r)p(r) 



T=0 

CXD 



(2.2) = E n^n^'S^^ = ^ 



The prior on T resembles the number of micro states W defined by Wil- 
son, but with the proportions as extra parameters. The proportions p have 
the important role of convening prior information on the structure of trip dis- 
tribution in the study area. From a behavioral perspective, pij corresponds 
to the probability of a trip in the system, out of the total T available, occur- 
ring between zones i and j; we could, for example, borrowing from random 
decision theory, define a multinomial logit model on each pij that depends 
on a set of covariates for each OD pair such as transport costs, time, and 
user preferences: 

_ exp(x^/3) 

Efc,/=i,...,nexp(x^;/3)' 

where /3 are known coefficients. 

While we are now assuming that p is known and thus fully specifies P(T) 
above, we can further incorporate uncertainty by adding another level of ran- 
domness to the prior parameters to form a hierarchical model; we postpone 
such considerations to Section 3. 

2.1. Estimation. The inference we wish to carry out is driven by our 
updated belief in T after observing O and D as summarized by the posterior 
distribution 



P{T\0,V) 

(2.3) 



_ p(0,p|r)p(r) 

^ I[T gC{0,V)]F{T) 
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One important consequence of T € C{0, T>) in the posterior above is that 
the prior parameter T imphcitly satisfies 

n n 

(2.4) T = Y,Tij = Y.O^ = T.^^^ 

i,j i=l j=l 

that is, O and 2? are self-consistent through T. 

A common estimator for T is the maximum a posteriori (MAP) estimator, 
the posterior mode: 



r = argmax|logP(r|0,P)| 

argmax < Y^Tijlogpij -logTijl > 
r£C{0,v) [ J 

rgmax <^ Tij logpij - (Tij logT^ - Tij) \ 
-.c(o,v) [ , . J 



ar^ 

TeCiO; 

= arg max < - ( Tij log ^ - Ti- 
TGCiCV) [ ^ \ Pij 

Note the similarity between the maximand and log W . It is now straight- 
forward to show that 

T^j — jA{Oi]3j DjPij ^ 

where Ai and Bj are balancing factors. Thus, the MAP estimator is equiv- 
alent to the solution obtained from the Furness method for the maximum 
entropy formulation. In fact, if we use a prior seed matrix To = {tij} to 
set Pij = tij / Ylii jUj-, the prior proportions, we recover the growth factor 
solution. 

To obtain gravity model solutions we just have to define p based on 
an entropy maximizing principle: we want p that maximizes the entropy 
'H{p) = —YlijPij^'^SPij possibly subject to additional constraints on p 
other than YlijPij — ^- Since entropy uniquely measures the amount of 
uncertainty in a probability distribution, a maximum entropy assignment 
is justified as the only unbiased assumption we can attain under a state of 
partial knowledge of the system. As Wilson [13, pg. 10] points out, "the prob- 
ability distribution which maximizes entropy makes the weakest assumption 
which is consistent with what is known" . If we then constraint on trip costs 
by requiring a fixed mean cost in the region 



(2.5) ^ ] ^ijPij ~ 
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we obtain pij cx exp(— /3cjj), and hence a gravity model with a famihar 
exponential deterrence function. 

Even though setting p as above provides the same solution, there is a sub- 
tle but important difference to the original maximum entropy formulation: 
in Wilson's model we constraint the trip patterns using (1.5), effectively 
reducing the number of feasible trip configurations, while in our proposed 
model we only restrict the proportions using (2.5) to redefine the weights 
on trip patterns. In other words, our feasible space is still only constrained 
by (1.2), but we set the proportions as a structural guide for estimation 
since the shape of the posterior distribution on T depends on p. In this 
sense, we can think of (2.5) as a "soft" constraint. We can argue that such 
a formulation is more natural since we can certainly have prior knowledge 
of overall transport expenditures in the system while it seems artificial to 
establish a rigid cost constraint on the whole study region. 

Another good estimator is the posterior mean, defined as 

T = E[T\0,V]=^f-Pif\0,V). 

f 

The posterior mean is more "robust" than the posterior mode since it av- 
erages the uncertainty on trip patterns across all possible T — weighted by 
their respective posterior probability mass — as opposed to simply picking 
the trip pattern with highest posterior probability. Moreover, since the pos- 
terior mean is a linear combination of feasible trip patterns, it also satisfies 
the linear constraints in (1.2). There is, however, one major difficulty in this 
venue: we need to know P(T| 0,V) for each T. 

The main hurdle in evaluating the posterior on T in (2.3) is the normal- 
izing factor Z{0, V) = "^^f^zc^o v) ^O')- Computing Z{0^ V) requires sum- 
ming over all possible pairwise trip assignments that are (O, D)-consistent, a 
daunting task. Before addressing this central issue, we offer some motivation 
in the next subsection. 

2.2. A simple example. Suppose that, for n = 2 zones, we observe Oi, 
O2, Di, D2, and wish to estimate the entries T in the OD matrix 



Til 


T12 


Oi 


T21 


T22 


O2 


Di 


D2 


T 



with margins and total number of trips T displayed. 

Since T is consistent, we know that T12 = Oi — Tn, r2i = Di — Tn 
and T22 = O2 — Til = Tn — (T — O2 — D2) = Tn — A, where we set 
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A = T — O2 — D2. The posterior on T is then a posterior on Tn due to these 
hnear constraints: 

P(Tn I 0,V) a ^' . pff pf^^P^f P^f 



/^ll /^12 /^21 ^^22 



(2.6) rn!(Oi - Tn)!(Di - Tn)!(rn - A)! 



= i7(Tn;0i,I)i,A,^), 

where V = (piij522)/(l'i2P2i) can be interpreted as a intra-interzonal odds 
ratio. Since L>i — A = T — Oi, we can see that Tn foUows a non-central 
hypergeometric distribution[ll]: 

Tu\0,V ^EG{Oi,DuA;iP). 

Note that T € C{0,V) is equivalent to requiring that max{0,A} < 
Til < niin{Oi, -Di}, and so the normahzing constant for (2.6) is the sum 
of its right-hand side over the values of Tu above. In practice, however, it 
is simpler to obtain posterior samples of Tu using a Metropolis- Hastings 
algorithm [9, 7, 8]. 

As proposal we adopt a random walk: given our actual position at 
iteration t—1, we set our candidate T^^ a step to the left, Tf^ = T^l ^^-1 with 

probability 0.5 or a step to the right, T^^ = + 1 with probability 0.5. If 

Tfi < max{0. A} or Tf^ > mm{Oi, Di} we immediately reject Tj*^ — and set 
T^i = T^i — as it is out of bounds. Otherwise we accept T^^ — and thus set 
= T^i— with probability mm{R{Til~^\T*^), 1}, where R{Til~^\T*^) is 
the acceptance ratio 

_ H{T^,;Ol,Dl,A,^^;) 
H{t[{ '^Oi,Z)i,A,V) 

We denote this Metropolis step by 

tS =MS{tI{-^^;Oi,Di,A,^). 

To summarize, we can obtain samples from Tu by doing: 
Step 1. Start at some arbitrary initial T^f^ . 
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Step 2. For t = 1,2, . . . do (until convergence): execute a Metropolis step, 

Ti'^ =MS{Ti{-'^;Oi,D,,A,i;), 

that is, 

Step 2.1. Sample candidate sample U ~ C/(0, 1); if C/ < 0.5 set 
Ti*i = t[{~^^ - 1, otherwise set T^^ = T^'^^ + 1. 

Step 2.2. If Tj^i < max{0,A} or T^^ > min{Oi,L>i} set T^f = 
Ty^ ^'^ (reject). Otherwise, sample U ~ C/(0, 1): if f/ < 
m.in{R(T^\ ^\t^-^),1} then set T^^f = Tj*^ (accept), else 
set T}f = T^"^^ (reject). 

A numerical example should help us further gain intuition on the problem. 

Example 1. Let Oi = 40, O2 = 40, Di = 60, D2 = 20, pu = 0.1, pi2 = 
0.2, P21 = 0.3, and P22 = 0.4. It follows that T = Oi + O2 = Di + D2 = 80, 
A = r - O2 - ^2 = 20, and = {P11P22) / {P12P21) = (0.1 • 0.4)/(0.2 • 0.3) = 
2/3, and so Tn ~ HG(40, 60, 20; 2/3). 

Using random walk Metropolis samples T^l^ , ■ ■ ■ , T^f^ we can produce 
point estimates for Tn if desired: the posterior mean, 

- 1 

Tu = E[Tu\0,V]^-Y,T^f\ 



9=1 



and the posterior mode, 



Tn = argmax P(Tii = x\0,V). 

x=max{0,A},...,min{Oi,Z)i} 

Tn can be obtained from estimates for P(Tii | O,!?), by Monte Carlo simu- 
lation, 

1 ^ 

(2.7) P(Tn = X I O, P) ^ - ^ liTif = x), 

9=1 

or from the Furness method. Using G = 10,000, we obtain Tn = 28.43 and 
Tn = 28.49, and so both the posterior mean and posterior mode, estimated 
from our samples and rounded to the nearest feasible integer, are ~ 28. 
It is not uncommon for both estimates to coincide, especially when the 
distribution is unimodal and close to symmetric, as in this case. 
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Interestingly, P(Tii = 28\0,T>) ~ 0.20; even for this simple example with 
a small number of trips we can see that the probability of the most probable 
trip configuration corresponds to a small fraction of possible configurations. 
This effect should not come as a surprise: as the number of zones and margins 
grow, so do the number of possible consistent configurations, and so the 
probability of any single trip configuration becomes even smaller. 

We have previously remarked on the structural role of the proportions p, 
serving as a guide when searching for a representative trip pattern among 
the many possible feasible configurations. We note, however, that there is 
no principled reason to expect a close relation between p and actual propor- 
tions T/T since the latter is constrained by origin and destination margins. 
As an example, consider Figure 1, where we show the marginal posterior 
distributions of Tn, T12, T21, and T22, along with expected "structural" 
number of trips given by Tp. The discrepancies are clear once we observe 
that Tpii + Tpi2 = 24 < 40 = Oi and similarly for the other margins; 
equivalently, (Tn + T\2)IT = 0.5 > 0.3 = pu + P12 for any (feasible) trip 
pattern T. 



i 



Til T12 T21 J22 



Fig 1. Estimated posterior distributions of T from 10,000 samples. Squares mark expected 
structural trips. 

2.3. Posterior sampler. Let us now extend the results from the last sec- 
tion to our problem. In general, for n zones we have the following OD matrix 
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with margins displayed: 



Til 


Tu ■ 


Tin 


Oi 




T22 • 


T2n 


02 




Tn2 • 


■ T 


On 


Di 


D2 ■ 


■ Dn 


T 



We now proceed to eliminate the first n — 1 entries in the last row and 
column by means of the linear constraints in the margins: 



(2i 



7,7 



Oi 



n-l 

i=l 
n-l 



J 



1, 



1, 



i = 1, . . . , n — 1. 

The corner entry Tnn requires special handling: 



n-l 



(2.9) 




Ultimately, Tnn stems from the symmetry in equation (2.4). 

To sample from the entries in the {n — l)-by-(n — 1) upper submatrix 
S we adopt a Gibbs sampler [6]; see also [7, 8]. The conditional posterior 
distributions are P(Tjj | T^^jj, 0,T>), for i,j = 1, . . . ,n — l, where Tfjj] denotes 
ah the entries in T but Tij, that is, Tfj^] = {Tki}k,i=i,...,n-i,k^i,lj^j- The only 
terms in F{T \ 0,T>) that depend on Tij are now related to Tin and Tnj 
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through equations (2.8) and to r„„ through equation (2.9). Namely, 



F{Tij\T[i^],0,V) 



oc 



t^ij t^in t'nj fnn 

rP^3 ~'^i3 

Pin fin Pni P^n 



(2.10) ^ Pij Pin fnj 

Tijli^Oij TijY'{Dij Tij^\(Tij Z\jj)! 
V / V ^« j -^ij / 

where we define Oij = O—Y^i^^^ ^^_^i^^Tiu Dij = Dj-J2k=i,...,n-i,kjtiTkj, 
Aij = A - T.k,i=i,...,n-i,k^i,i^j'^ki, and i^ij = [pijPnn) / {pinPnj)—a. "within- 
between" odds trip ratio — to simphfy the expressions. Thus, 

(2.11) Tij I %] , O, P ~ nG{Oij , Dij , A,y ; ) . 

It is now straightforward to sample from the posterior for T using a hy- 
brid Metropolis-within-Gibbs sampling scheme since we know how to sample 
from the non-central hypergeometric: 

Step 1. Start at some arbitrary initial configuration T^^^ ■ 
Step 2. For t = 1, 2, . . . do (until convergence): 

Step 2.1. For i, j = 1, . . . ,n — 1 do: sample T-j^ ~ Tij \ Tj^^*.j ^\o,V 
in (2.11) using a Metropolis step, 

with ofj , ofj , Afj , and ipij defined as above. Note 

that all the parameters but tpij depend on Tj^^*.j and so 
carry an iteration index. 

It should be noted that this sampling scheme is similar to the more general 
scheme from algebraic statistics and based on Markov basis [3]. 

Example 2. We end this section with an example taken from [12, 
pg. 179]. The costs {cij} between four zones are listed in Table 1, along 
with observed origin and destination margins. 

Let us now assume that pij oc exp(— /3cjj) with /3 = 0.10. After running 
our Gibbs sampler until assumed convergence, we take G = 10,000 samples 
to perform posterior inference; the marginal posterior distributions for Tij 
in the upper 3-by-3 matrix are summarized in Figure 2. 
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Table 1 

Trip costs between four zones with observed origin and destination margins. Reproduced 

from [12, table 5.8]. 



Zone 


1 


2 


3 


4 


o, 


1 


3 


11 


18 


22 


400 


2 


12 


3 


13 


19 


460 


3 


15.5 


13 


5 


7 


400 


4 


24 


18 


8 


5 


702 


Dj 


260 


400 


500 


802 


1962 




Fig 2. Estimated posterior distributions of T from 10,000 samples. 



The posterior mean T, estimated from our samples by 

_ 1 ^ 

(2.12) T='£.[T\0,V]^-Y^T^9) 

9=1 

is very similar to the Furness solution reported in [12]. We list T along with 
95% credible intervals for each Tjj in Table 2. The credible intervals are 
wider than in our previous simple example due to the much higher number 
of feasible configurations in C(0, P). In fact, we estimate from the posterior 
samples that V{T = T\0,V) ^ F{T = f\0,V) 2 • 10"^ Since the most 
probable trip pattern accounts for only 0.2% of the posterior probability 
mass, we can conclude that even the Furness solution has little support from 
the data. Interval estimators now become more attractive representatives of 
the posterior space of trip configurations given a desired credibility level. 
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Table 2 

Posterior mean and 95% credible intervals. 



Zone 




1 




2 




3 




4 


1 


157.14 


[147, 169] 


97.37 


[85, 110] 


68.73 


[56, 81] 


76.75 


[64, 91] 


2 


58.70 


[48, 68] 


206.35 


[190,221] 


101.27 


[84, 116] 


93.69 


[79,91] 


3 


24.16 


[16, 33] 


44.91 


[33, 56] 


138.32 


[125, 151] 


192.61 


[177, 207] 


4 


20.00 


[12, 29] 


51.37 


[40, 64] 


191.68 


[172,211] 


438.95 


[418, 460] 



An even better alternative is to use the whole posterior distribution to 
propagate the randomness in T in our subsequent analyses. Consider, for 
instance, the mean regional cost 

c{T) = "^CijTij/T, 

and let us compare its posterior distribution, as induced by T, to the fixed 
value Cp — the mean prior regional cost — we set as a restriction in (2.5) to de- 
fine /3. Since /3 = 0.1, Cp = 8.51. We can now use our samples T^^\ ■ ■ ■ , T^'^^ 
from the Gibbs sampler to generate realizations 

(2.13) c(r(^)) = 5^c.,r(^Vr 

i,j 

and estimate P(c(T) {O,!)). Figure 3 shows a histogram based on {c{T^'^^)}. 
The estimated posterior mean cost is E[c{T)\0,'D] = c{T) = 8.67, the 
posterior mode cost — the Purness solution cost — is c(T) = 8.70, both higher 
than Cp, while a 95% credible interval for c(T) is [8.46, 8.88], barely covering 
Cp] moreover, 

1 ^ 

p(c(r) >Cp\o,v)^ _ ^/[c(r(^)) > Cp] = 0.93. 

9=1 

That a great proportion of possible trip patterns is spending more than 
previously expected strongly suggests that a lower value for f3 would be 
more realistic given the restrictions on T by O and T>. 

We might also want to analyse the trip length distribution (TLD) of the 
system: given a set of K cost ranges (cq, ci], . . . , {ck-i, ck], where < cq < 
ci < ■ ■ ■ < ck < oo, we bin the proportion of trips T^/T with costs in the 
k-tli range (cfc_i,Cfc] for each k = 1,... ,K. We again use our samples to 
generate an estimate for each T^: 

(2.14) r(^) = ^7;^)/{c., G(cfc_i,c,]}. 
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C 



Fig 3. Estimated posterior distribution of mean regional cost from 10,000 samples. Solid 
line indicates posterior mean, dashed line marks prior mean, and dash-dotted line marks 
posterior mode cost. 



Table 3 compares the mean posterior TLD with the prior TLD using aggre- 
gated range proportions {pfc}fc=i,...,/f, where = Y.i,jVijl{cij G (cfc-i,Cfc]}. 
Figure 4 represents both TLD with additional 95% credible intervals for each 
range. The discrepancy between prior proportions p and posterior propor- 
tions Tij/T is now more evident due to the structure in the TLD. In the 
next section we will propose a principled way to narrow the gap between 
these two regional features. 



Table 3 

Mean posterior TLD and prior TLD from proportions p. 



Range 


(0,4] 


(4,8] 


(8, 12] 


(12,16] 


(16,20] 


(20, 24] 


F.[Tk/T\0,V] 


0.18 


0.49 


0.08 


0.09 


0.11 


0.05 


Pk 


0.26 


0.38 


0.11 


0.13 


0.08 


0.04 



3. Extensions to the Proposed IVIodel. As we have seen in the last 
example in the previous section, prior beliefs might be deceptively outdated 
or based on regions that are not similar to the current study region. As a 
consequence, the related posterior distribution might be wrongly biased and 
scaled, affecting the estimation. In addition, it is possible that during the 
process of eliciting the prior proportions we realize that the trip structure 
in the region is uncertain as it might change during the study time frame 



16 



L. CARVALHO 



16-20 20-24 



Fig 4. Mean posterior TLD (bars) with 95% credible intervals (whiskers), and prior TLD 
(squares). 



due to, for example, seasonal effects. 

A natural approach is then to adopt our same viewpoint with respect to 
trip patterns and to explicitly quantify the uncertainty by regarding the 
proportions themselves as random, yielding a hierarchical model. Under 
this updated model the proportions p are now random and our samples 
from the last section are now conditional on p, that is, P(T | O, T>) becomes 
PCT" I P) Cj Nevertheless, we can still proceed in the same way we have 
done before if we integrate out the uncertainty in the nuisance parameters, 
the proportions, to obtain the marginal posterior distribution on the trips 

r, 

(3.1) p(r I o, p) = j p(r, p j o, v)dp. 

It is noteworthy that similarly to the previous posterior derivations, 

p(r, p I o, p) a p(o, V j r, p)P(r, p) = p(o, v \ t)v{t \ p)p(p), 

that is, we now simply condition T on p (compare with the numerator 
in (2.3)). The integral in (3.1) can be hard to evaluate directly, but we 
can again resort to Monte Carlo methods to sample from P(T| O^V) and 
conduct the inference, as we will see shortly. 

Even though a hierarchical model increases complexity, it has two main 
advantages. First, we can now explain the uncertainty in trip pattern struc- 
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ture by specifying a suitable probability distribution for p. This way, lack of 
information about trip pattern behaviors in the study region is reflected by 
more variability in the proportions, which, in turn, results in more dispersed 
trip pattern posterior distributions. 

Secondly, we can better incorporate additional data that are related to 
the trip pattern structure. For instance, if there is available preliminary data 
To — usually from a small scale study in the same region or from a region 
with very similar structure — we can seamlessly incorporate it in the inference 
through the posterior P(T | O, D, To)- This last posterior distribution can be 
obtained by adding the extra conditional on To in (3-1) and defining the 
likelihood P(To | p) to derive 

p(r, p\o,v, To) oc p(o, V, To I r, p)p(r, p) 
^ ' ^ = p(o, V I r)P(ro I p)P(r | p)p(p). 

Note that we make the reasonable assumption that T and To are condition- 
ally independent given p. 

An alternative, common approach is to assume that the proportions p are 
unknown, use To to estimate them, and then adopt the obtained estimate as 
if it were the "true" value of p; this approach is called empirical Bayes in 
the statistical literature, but is traditionally referred to as calibration in OD 
matrix estimation. Albeit being computationally simpler, this treatment has 
the drawback of underestimating variance, that is, it does not fully reflect 
the total uncertainty in the inference [10]. 

To better elucidate the proposed hierarchical models we present two ap- 
plications next. 

3.1. Incorporating seed matrices. A good candidate for the hyperprior 
distribution on p is the multinomial conjugate distribution, the Dirichlet 
distribution, p ^ Dir(7r), with mass function 



p(p)anp; 



We then have 



p(r, p I o, oc n ^ Up^r'^^ e c{o, V)] 
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A non- informative prior on p is attained by setting tt = (1, . . . , 1) which is 
equivalent to p having a uniform distribution over all {pij} G [0, 1]*^ such 
that "YliijPij = 1- this case, the expression for P(T, p | 0,T>,To) above is 
exactly the same as (2.3), but with the important distinction of now being 
a joint distribution since p is random. 

Suppose now that we have preliminary data To = {tij}i,j=i,...,n in the 
form of a seed matrix of trip counts. In the classical approach discussed 
in the introduction, To is commonly used to estimate the proportions as 
Pij = tij/To, where To = 'Ylki^ku or to simply kick-start an estimation 
procedure. This approach, however, effectively ignores the sample size To 
since pij remains the same if we observe n times more counts, kTq, even for 
K arbitrarily large; furthermore, similarly to empirical Bayes, it yields lower 
posterior variances for T. 

Following our discussion, here we offer a more principled way to incor- 
porate the seed matrix To by performing posterior inference on T through 
the distribution in (3.2). We assume that, similar to T~, the seed counts fol- 
low a conditional multinomial distribution. To ~ MN(ro,p) with flat prior 
P(ro) oc 1. Adopting the same Dirichlet distribution for p we have 



p(r, p I To) oc n ^ n 77 J\p:^'~'irr e c(o, V)] 



(3.3) 



T- I 



and thus p | T, To ~ Dir(7r + T + To)- 

To sample from P(T, p | 0,2?, To) we adopt an extended Gibbs sampler 
with an extra step that accommodates the new hierarchical level: we itera- 
tively sample from P(T | p, O, T)^ To) = P(T | p, O, P) exactly how we were 
doing in the previous section, and sample from the conditional Dirichlet 
P(p I T, 0, 15, To) = P(p I T, To). If a seed matrix is not available, the second 
step becomes simply sampling from P(p|T), still a Dirichlet distribution. 
The updated Gibbs sampler is listed below. 

Step 1. Start at some arbitrary initial configuration T'^^^ and initial propor- 
tions p('^\ 

Step 2. For t = 1,2,... do (until convergence): 

Step 2.1. For i, j = 1, . . . , n-1 do: sample T\f ~ Tij \ t\^~^'^ , p(*~i) , O, V 
from a non-central hyper geometric using a Metropolis step, 

7^) =M5(T5-);Og-),4-),Ag-),4-)), 
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with of- ^\ D-* ^ and A,^* as before, and 

Step 2.2. Sample p(*) ~ Dir(r(*) + + vr) or p(*) ~ Dir(r(*) + vr) 
if To is not available. 

To perform inference on the marginal posterior P(T | 0, 2^, To) we just 
need to use the realizations from the Gibbs sampler; the posterior mean, 
for instance, is readily available from (2.12). MAP estimates, however, are 
harder to obtain since we need to compute the integral in (3.1). One alter- 
native is to use the joint posterior mode. 



T=argmax< max P(T, p | O, I?, To) }• , 

TeC(o,D) [ pe[o,i]"":E,.,p«,=i J 

but then the estimate might be biased since it is conditional on the optimal 
value of p. In the same vein, we could first "calibrate" by setting some 
specific p, say the marginal posterior mean 



1 ^ 

p = E[p|o,p,ro]«-5^p(^), 



9=1 



and then produce 



(3.4) r = argmax P(r|p,0,P,ro). 

reC{o,D) 

It can be shown that the first estimator, T, can be obtained by an extended 
Furness method that iteratively solves for p while fitting the balancing fac- 
tors by setting 

Efe,i=l,...,n '^kl + + VTfc/ - 1 ' 

but we will not pursue it further here. 

3.2. Incorporating prior trip length distributions. Seed matrices provide 
information on each OD pair in the system and thus derive more accurate 
trip pattern inferences. More often than not, however, we do not have prelim- 
inary data To at this level of detail at our disposal. In some cases To contains 
censored observations; we might observe trips in a survey, but these trips 
are known only to have come from a certain origin, or to a destination, or 
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to have had some specific travel cost. For instance, recahing the trip length 
distribution (TLD) from Example 2, we might only discriminate a trip in 
our survey by specifying its cost "bin", that is, within which range its cost 
falls. 

Assume that we know the OD trip costs {cij} and consider, as before, the 
K cost ranges (cq, ci], . . . , {ck-i, ck\- Our preliminary counts now fall into 
K possible strata. To = {^i, • • • , ii^}, depending on their transport costs: we 
observe ti trips with costs between cq and ci, t2 trips spending between and 
ci and C2, and so on. If we again define range proportions aggregated by 
cost po = {pk}k=i...,K, where pk = ^ijPijHcij G (cfe_i,Cfc]}, we can then 
analogously set To | p MN(To,po) with P(To) oc 1 as the preliminary data 
likelihood. We note that po is a function of p. 

We can assume the same Dirichlet distribution for the proportions, p ~ 
Dir(7r), but since 




and each p^ is a sum of pij for all pairs i and j with cost in the k-th bin, we 
lose the conjugacy. Another approach, in case we are more informed about 
the censored proportions, is to opt for a Dirichlet prior on po; but then 
we again lack conjugacy. Regardless, we can still obtain a Gibbs sampler 
that is very similar to the scheme shown in the previous subsection; we just 
need to substitute the direct Dirichlet sampling step. Step 2.2, by another 
Metropolis step. Next, we provide an updated sampling scheme in a simpler 
context. 

Suppose that the proportions follow a gravity model withpij oc exp(— /3cjj), 
as in the previous section, but now we make /3 random to drive the uncer- 
tainty in p. Moreover, we settle on a Dirichlet prior on po, Po(/3) ~ Dir(7r), 
where tt = {tti, . . . , ttk}- In what follows we explicitly represent the depen- 
dency of the proportions on /3 for clarity; we also note that now 

Pk J^exp(-/3ci,)/{ Cij G (cfc_i,Cfc]}. 
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The joint posterior is thus given by 
(3.5) 

p(r, /3 1 o, V, To) oc n ^^4^ n ^-^Up^if^r^'^nr e c{o, V)] 

tj-'k k 



cx 



Ylp^m^'' Up'^ii^)''^'"''' nr G cio,v)]. 



From (3.5) we deduce that setting tt = {1, . . . , 1} for a non-informative 
Dirichlet prior is equivalent to having a flat improper prior for the cost 
deterrence, P(/3) oc 1. 

The Gibbs sampler has two iterative steps: we alternate between sampling 
from T conditional on the impedance j3 and all the data, P(T|/3,C',I5,7o), 
and sampling from /3 conditional on trip patterns T and margins and prelim- 
inary data, P(/3 | T, O, V, To)- We already know, since Section 2, how to sam- 
ple from P(T I /3, O, I^, To) = P(T | p(/3), O, V) using random walk Metropo- 
lis steps for the non-central hypergeometric. To sample from P(/3 | T, O, V, To) 
we construct another random walk Metropolis step. 

First, let us define the normalizing factors Zk{l3) = j exp{—(3cij)I{cij G 
{ck-i,Ck]} and Z{P) = Eij exp(-/3cij) = Xlfc -^fc(/3), so that = exp{-f3cij)/Z{P) 
andpk = Zk{l3)/Z{l3). Also, recafl that T = Ylij^^ij^ = Y^k^k, and define 
^0* = Efc(ifc + ^fc - 1) = To + Efc ^fc - K. The function $(/3; T, To) in the 
joint posterior (3.5) then simplifies to 



Zk{P)\ 



\ m I 

= exp <^ - /? ^ CijTij + ^{tk vTfc - 1) log Zfc(/3) 

I i,j k 

-(r + ro*)iogz(/3)|. 

As proposal distribution, let us select a normal distribution centered at the 
current realization of /3 in the chain with small variance a . To get at 
the t-th iteration we then sample a candidate /3* ~ N a"^) and accept 
or reject it based on the acceptance ratio 

(3.6) T?(/3(-),r)- ^^^^'^'^''^'^"^ - ^^^-^^'~''^^t'') 



p(/3W |r,o,p,ro) $(/3(*-i);r(*-i),rJ 
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The final, updated Gibbs sampler is listed below. 

Step 1. Start at some arbitrary initial configuration T^'^^ and initial impedance 

Step 2. For t = 1, 2, ... do (until convergence): 

Step 2.1. For i,j = 1,..., n-1 do: sample T^^ ~ T^j \ Tjg^^^ p(/3(*-i)), O, V 
from a non-central hypergeometric using a Metropolis step, 

= MS{T^^-'^ ; Og-^) , 4"^) , Ag-^) , ^|;,, . 

with 

^'^^^ ^ P»(/3(*-i)K,(/3(*~i))- 

Step 2.2. Sample candidate /3* ~ N{l3^^-^\a^) and set = /?* 
(accept) with probability min{l, i?(/3(*~^) , /?*)} where i?(-) 
is the ratio in (3.6); otherwise, set = /3(*~^) (reject.) 

Example 2, revisited. Under the same setting of Example 2, but now 
with /3 random, let us initially set 7r = {l,. ..,!}, that is, a non- informative 
prior on /3. We run a Gibbs sampler with proposal variance cr^ = 10~^ until 
convergence and take G = 10,000 samples for posterior inference. 

Our estimate for /3, ^5 = E[/3 | O, P] « ^ Ylg=i P^^^ = 0-031, is much lower 
than the assumed value in Example 2 (/3 = 0.1), which corroborates with our 
previous remark about a more realistic value for the cost impedance. Such 
lower values are expected since the inference is solely driven by the observed 
data and thus better represents the margin constraints. The estimated 95% 
credible interval for /3 is large, [0.009,0.056], reflecting the high degree of 
uncertainty that arises from trying to capture the structural trip proportions 
using a single parameter. 

The effect of a random (3 in trip patterns can be appreciated in the esti- 
mated marginal posterior distributions for T pictured in Figure 5. We draw 
attention to the increased spread when compared to the distributions in 
Figure 2. We also observe that the Furness solution, conditional on (3 and 
represented by squares, is similar to the posterior mean E[T| 0,V\. 

The higher variability in T is reproduced by wider credible intervals in 
the trip length distribution, as shown in Figure 6: each bar represents the 
estimated posterior mean of T^/T for each cost range, the squares pinpoint 
the posterior mean of Pki/3), while the dotted line corresponds to the prior 
mean 1 /K. As can be seen, the dependence of the proportions on a single pa- 
rameter makes the distribution on p not flexible enough to follow T closely. 
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Fig 5. Estimated marginal posterior distributions for T from hierarchical model with non- 
informative prior on p. Squares mark conditional Furness solution. 
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Fig 6. Mean posterior TLD (bars) with 95% credible intervals (whiskers) , and mean pos- 
terior TLD proportions (squares). The dotted line marks the prior mean, 1/K. 
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We note again the higher variabihty in the posterior TLD as assessed by the 
wider 95% credible intervals (whiskers) when compared to Figure 4. 

Suppose now that we observe preliminary data To from [12, pg. 186] in 
Table 4. Keeping the flat prior on /3 and = 10""^, we perform posterior 
inference from 10,000 samples taken from the Gibbs sampler after conver- 
gence. 



Table 4 

Preliminary TLD. Data reproduced from [12, table 5.14]. 



Range 


(0,4] 


(4,8] 


(8, 12] 


(12, 16] 


(16, 20] 


(20, 24] 




365 


962 


160 


150 


230 


95 


tk/To 


0.19 


0.49 


0.08 


0.08 


0.12 


0.05 



The preliminary TLD counts are very informative, Tq = T = 1962, and 
greatly affect the inference: our updated estimate for the cost deterrence 
is a higher /3 = E[/3|0,P, To] = 0.086, closer to the original (3 = 0.1 in 
Example 2, and the 95% credible interval for /3 is much tighter, [0.086, 0.093]. 

The posterior inference on trip patterns is summarized by Table 5, show- 
ing posterior mean T and marginal 95% credible intervals, and Figure 7. 
The marginal distributions have increased variability when compared to 
Example 2 due to the randomness in the proportions, as expected. The 
variance is, however, not much higher since the preliminary TLD is very 
informative. The conditional Furness solution T, shown in square marks 
in Figure 7, is very similar to the posterior mean. The estimated poste- 
rior probabilities of these solutions are F{T\ f3,0,V,To) = 1.3 • 10~^ and 
F{f\^,0,'D,To) = 1.5 • 10"^, slightly smaller than in Example 2. 



Table 5 

Marginal posterior mean and 95% credible intervals. 



Zone 




1 




2 




3 




4 


1 


141.34 


[128, 155] 


101.49 


[87, 118] 


71.11 


[57, 85] 


86.07 


[71, 103] 


2 


63.87 


[52, 76] 


184.96 


[168, 204] 


106.10 


[89, 120] 


105.07 


[90, 122] 


3 


28.47 


[20, 37] 


51.32 


[39, 63] 


131.06 


[116, 146] 


189.14 


[172, 205] 


4 


26.31 


[17,37] 


62.23 


[48, 77] 


191.73 


[174, 209] 


421.72 


[400, 444] 



Since /? < 0.1 with high posterior probability, we should expect the system 
to spend more when compared to the scenario in Example 2. Figure 8 dis- 
plays the posterior distribution of trip costs c(T), as estimated from (2.13). 
The posterior mean regional cost c(T) = E[c(T) \ 0,V,To] is 9.12, with a 
95% credible interval of [8.81,9.45], higher than before. The posterior mode 
cost cCT) is 9.09, close to c(T), as expected since the estimates are similar. 
The proportion cost Cp{f3) = CijPij{/3) in (2.5) inherits the randomness 
from /3; its estimated posterior mean, 8.95, is lower than c(T), which can 
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Fig 7. Estimated marginal posterior distributions for T from hierarchical model. Squares 
mark conditional Furness solution. 

also be attributed to the rigidness in p. 

Finally, we can also see the effect of To in reducing the inferential uncer- 
tainty in the posterior TLD at Figure 9, as illustrated by the tighter 95% 
credible intervals. We still see the discrepancy between the posterior TLD — 
whose mean E[Tfc/r | O, P, To] is represented by bars — and the posterior 
proportion TLD — whose mean E[pfc(/3) | O, P, To] is identified by squares. 
We note, however, that the posterior mean TLD is close to the prior mean 
TLD, tk/Tf), represented by diamonds and listed in Table 4, since To is highly 
informative and thus influential. The two mean posterior TLD are listed in 
Table 6. 



Table 6 

Posterior mean trip length distributions based on T and p. 



Range 


(0,4] 


(4,8] 


(8, 12] 


(12, 16] 


(16,20] 


(20, 24] 


E[Tk/T\,0,V,To] 


0.17 


0.48 


0.08 


0.09 


0.12 


0.06 


E[pkil3)\,0,V,To] 


0.24 


0.36 


0.12 


0.14 


0.10 


0.04 


4. Discussion. 


Static 


origin- 


destination 


matrix 


estimation 


has been 



traditionally regarded as an optimization problem. Here we draw from the 
contingency table literature and cast OD matrix estimation as a formal 
statistical inference problem and adopt a Bayesian approach where trip pat- 
terns are considered random. Furthermore, we make model assumptions on 
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Fig 8. Estimated posterior distribution of mean regional cost. Solid line indicates poste- 
rior mean, dashed line marks posterior mean proportion cost, and dash-dotted line marks 
posterior mode cost. 
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Fig 9. Posterior mean TLD (bars) with 95% credible intervals (whiskers), posterior mean 
proportion TLD (squares), and prior mean TLD (diamonds). 
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the parameters describing the probability distribution on trip patterns — 
trip proportions that govern the structure of trip distribution — as opposed 
to the classical assumptions on particular objective functions. The use of 
trip proportions frees us from requiring seemingly artificial constraints on 
trip configurations, provides more easily interpretable results, and allows 
us to better incorporate other sources of data in a principled way within a 
Bayesian framework. 

By electing specific functional forms for the trip proportions — as based on 
the entropy maximizing principle, for example — we are able to recover clas- 
sical solutions as MAP estimators and thus inherit the justifications and rich 
history behind traditional approaches. Yet, perhaps the main benefit of our 
proposed approach is to better characterize the uncertainty in the solutions 
and, in general, in trip distribution. As we have showed in many examples, it 
is common for any point estimate — such as the Furness solution or posterior 
mean — to capture only a small fraction of possible trip configurations given 
the large number of alternatives. Point estimators, when seen as ensemble 
summarizers, can be useful for preliminary planning purposes and gaining 
insight on the trip distribution in the study region; they can, however, be 
poor substitutes of the full posterior distribution in further analyses as they 
can dramatically underestimate the variability in trip patterns. 

Preliminary data is traditionally used to calibrate specific parameters 
of the trip distribution model, such as cost deterrence. Nonetheless, fixing 
an optimal data fitting value for the parameter can further underestimate 
variance in the inference. In our fully Bayesian approach we explicitly ac- 
knowledge the uncertainty in the parameters by also making them random: 
we set a hyper-prior distribution on trip proportions to build a hierarchical 
model. As a consequence, and in contrast with a traditional approach, more 
informative preliminary data — for example, high counts in a seed matrix — 
yield more precise inference on trip configurations as we are able to more 
accurately characterize trip proportions. 

The adoption of a Bayesian framework carries many other benefits not 
covered here: besides point and interval inference, we are also able to test 
hypotheses by explicitly comparing models through Bayes factors; more- 
over, Bayesian methods can be further explored to perform model validation 
through posterior predictive checks. In summary, the flexibility of Bayesian 
statistics is particularly helpful and really comes to bear when exploring 
high-dimensional spaces such as the ensemble of feasible trip configurations. 

There is, however, a price to pay for such modeling power in higher com- 
putational costs, and thus the procedures discussed here still need to be 
more closely examined in this respect. Specifically, the increased complex- 
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ity in generating and analysing trip configuration samples instead of simply 
obtaining the most likely trip assignment needs to be assessed as the pro- 
posed routines are tried in real- world datasets comprising large systems. 
Future directions would also include the development of more efficient sam- 
pling schemes through improved algorithms — better proposal densities, for 
example — and faster implementations that would explore, for instance, par- 
allel versions of the proposed procedures. 

Finally, it should be noted that the models proposed here can serve as 
basis for an integrated higher level model that incorporates other traffic 
modeling steps; as an example, the effect of congested networks could be 
considered in OD matrix estimation if our model would jointly consider trip 
distribution and route assignment. As it is common in Bayesian modeling, 
we would then be able to propagate the uncertainty across steps while per- 
forming marginal inference on any aspect of the higher model conditional on 
data from all steps. Furthermore, other types of data could also be consid- 
ered to obtain more refined models with, for instance, link count data and 
camera sensors or temporal variation for dynamic OD matrix estimation. 
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